Exploring Morphosyntactic Annotation over a Spanish Corpus for Dependency Parsing
Abstract
It has been observed that the inclusion of morphosyntactic information in dependency treebanks is crucial to obtain high results in dependency parsing for some languages. In this paper we explore in depth to what extent it is useful to include morphological features, and the impact of diverse morphosyntactic annotations on statistical dependency parsing of Spanish. For this, we give a detailed analysis of the results of over 80 experiments performed with MaltParser through the application of MaltOptimizer. Our goal is to isolate configurations of morphosyntactic features which would allow for optimizing the parsing of Spanish texts, and to evaluate the impact that each feature has, independently and in combination with others.
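As a rough illustration of what isolating a configuration of morphosyntactic features can look like in practice, the sketch below filters the FEATS column of a CoNLL-X token line (the tab-separated format MaltParser reads by default) down to a chosen subset of attributes and enumerates the candidate subsets. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the helper names `keep_features` and `feature_configurations`, the sample token line, and the attribute names `gen`, `num`, and `postype` are all hypothetical.

```python
from itertools import combinations

FEATS_COL = 5  # zero-based index of the FEATS column in a CoNLL-X line


def keep_features(conll_line, keep):
    """Return the token line with FEATS reduced to the attributes in `keep`."""
    cols = conll_line.rstrip("\n").split("\t")
    feats = [] if cols[FEATS_COL] == "_" else cols[FEATS_COL].split("|")
    # Each feature is assumed to look like "attribute=value".
    kept = [f for f in feats if f.split("=", 1)[0] in keep]
    cols[FEATS_COL] = "|".join(kept) if kept else "_"
    return "\t".join(cols)


def feature_configurations(attributes):
    """Yield every non-empty subset of `attributes`, smallest subsets first."""
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            yield frozenset(subset)


# Hypothetical token line with illustrative attribute names.
line = "1\tcasas\tcasa\tn\tn\tgen=f|num=p|postype=common\t2\tsuj\t_\t_"

for config in feature_configurations(["gen", "num", "postype"]):
    print(sorted(config), "->", keep_features(line, config).split("\t")[FEATS_COL])
```

Each filtered copy of a treebank could then be used to train and evaluate a parser separately, so that the contribution of a single attribute, or of a combination, can be compared against the others.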
Similar resources
Morphosyntactic annotation of CHILDES transcripts.
Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of label...
High-accuracy Annotation and Parsing of CHILDES Transcripts
Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To d...
Construction Grammar Based Annotation Framework for Parsing Tamil
Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Some of the purely formal approaches to parsing such as phrase structure grammar, dependency grammar have been successfully employed for a variety of languages. While phrase structure based constituent analysis is possible for fixed order languages such as English, dependency analysis between the grammat...
Automatic Adaptation of Annotation Standards for Dependency Parsing: Using Projected Treebank as Source Corpus
We describe for dependency parsing an annotation adaptation strategy, which can automatically transfer the knowledge from a source corpus with a different annotation standard to the desired target parser, with the supervision by a target corpus annotated in the desired standard. Furthermore, instead of a hand-annotated one, a projected treebank derived from a bilingual corpus is used as the sou...
The SETimes.HR Linguistically Annotated Corpus of Croatian
We present SETIMES.HR, the first linguistically annotated corpus of Croatian that is freely available for all purposes. The corpus is built on top of the SETIMES parallel corpus of nine Southeast European languages and English. It is manually annotated for lemmas, morphosyntactic tags, named entities and dependency syntax. We couple the corpus with domain-sensitive test sets for Croatian and Se...